Properly Acting under Partial Observability with Action Feasibility Constraints
Abstract
We introduce the Action-Constrained Partially Observable Markov Decision Process (AC-POMDP), which arose from studying critical robotic applications with damaging actions. AC-POMDPs restrict the optimized policy to feasible actions only: each action is feasible in a subset of the state space, and, in addition to standard observations, the agent observes the set of actions applicable in the current hidden state. We present optimality equations for AC-POMDPs, which imply operating on α-vectors defined over many different belief subspaces. We propose an algorithm named PreCondition Value Iteration (PCVI), which fully exploits this property of α-vectors in AC-POMDPs. We also design a relaxed version of PCVI whose complexity is exponentially smaller than that of PCVI. Experimental results on POMDP robotic benchmarks with action feasibility constraints show the benefits of explicitly exploiting the semantic richness of action-feasibility observations in AC-POMDPs over equivalent but unstructured POMDPs.
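The feasibility observation described in the abstract can be illustrated with a small sketch. It is not PCVI and not the authors' code; it only shows, under assumptions, how observing the set of applicable actions narrows a belief to one belief subspace. The state names, action names, and the functions `condition_on_feasible_set` and `applicable_actions` are hypothetical and chosen for illustration.

```python
from typing import Dict, FrozenSet

# Hypothetical toy model: each hidden state has a set of feasible actions.
feasible: Dict[str, FrozenSet[str]] = {
    "corridor": frozenset({"forward", "stay"}),
    "cliff_edge": frozenset({"left", "stay"}),
    "open_floor": frozenset({"forward", "stay"}),
}


def condition_on_feasible_set(
    belief: Dict[str, float],
    feasible: Dict[str, FrozenSet[str]],
    observed: FrozenSet[str],
) -> Dict[str, float]:
    """Keep only states whose feasible-action set matches the observed one,
    then renormalize. Assumes the feasibility observation is noise-free."""
    kept = {s: p for s, p in belief.items() if p > 0.0 and feasible[s] == observed}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("observed feasible set is inconsistent with the belief")
    return {s: p / total for s, p in kept.items()}


def applicable_actions(
    belief: Dict[str, float],
    feasible: Dict[str, FrozenSet[str]],
) -> FrozenSet[str]:
    """Actions that are feasible in every state of the belief support."""
    support = [s for s, p in belief.items() if p > 0.0]
    return frozenset.intersection(*(feasible[s] for s in support))


belief = {"corridor": 0.5, "cliff_edge": 0.3, "open_floor": 0.2}
updated = condition_on_feasible_set(belief, feasible, frozenset({"forward", "stay"}))
safe = applicable_actions(updated, feasible)
```

After the update, the belief assigns no mass to `cliff_edge`, so every action in `safe` is feasible in all states the agent still considers possible. This is the sense in which a policy restricted to feasible actions operates on a belief subspace determined by the observed action set.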
Similar resources
Reasoning about Strategies under Partial Observability and Fairness Constraints
A number of extensions exist for Alternating-time Temporal Logic; some of these mix strategies and partial observability but, to the best of our knowledge, no work provides a unified framework for strategies, partial observability and fairness constraints. In this paper we propose ATLK^F_po, a logic mixing strategies under partial observability and epistemic properties of agents in a system with...
Computational Analysis of Control Systems Using Dynamic Optimization
Several concepts on the measure of observability, reachability, and robustness are defined and illustrated for both linear and nonlinear control systems. Defined by using computational dynamic optimization, these concepts are applicable to a wide spectrum of problems. Some questions addressed include the observability based on user information, the determination of strong observability vs. weak ...
Planning with Nondeterministic Actions and Sensing
Many planning problems involve nondeterministic actions, that is, actions whose effects are not completely determined by the state of the world before the action is executed. In this paper we consider the computational complexity of planning in domains where such actions are available. We give a formal model of nondeterministic actions and sensing, together with an action language for specifying planning...
Policy Learning with Hypothesis based Local Action Selection
For robots to be effective in human environments, they should be capable of successful task execution in unstructured environments. Many task-oriented manipulation behaviors executed by robots rely on model-based grasping strategies, and model-based strategies require accurate object detection and pose estimation. Both of these tasks are hard in human environments, since human environments...